Development and validation of an ultrasound‑based radiomics nomogram to predict lymph node status in patients with high-grade serous ovarian cancer: a retrospective analysis

Background
Despite advances in medical imaging technology, the accurate preoperative prediction of lymph node status remains challenging in ovarian cancer. This retrospective study aimed to investigate the feasibility of using ultrasound-based radiomics combined with preoperative clinical characteristics to predict lymph node metastasis (LNM) in patients with high-grade serous ovarian cancer (HGSOC).

Results
A total of 401 patients with HGSOC from two institutions were enrolled: institution 1 for the training cohort (n = 322) and institution 2 for the external test cohort (n = 79). Radiomics features were extracted from three preoperative ultrasound images of each lesion. During feature selection, primary screening was first performed using the sample variance F-value, followed by recursive feature elimination (RFE) to filter out the 12 most significant features for predicting LNM. The radscore derived from these 12 radiomic features and three clinical characteristics were used to construct a combined model and nomogram to predict LNM, and subsequent 10-fold cross-validation was performed. In the test phase, the three models were evaluated in the external test cohort. The radiomics model had an area under the curve (AUC) of 0.899 (95% confidence interval [CI]: 0.864–0.933) in the training cohort and 0.855 (95% CI: 0.774–0.935) in the test cohort. The combined model showed good calibration and discrimination in the training cohort (AUC = 0.930) and test cohort (AUC = 0.881), which were superior to those of the radiomic and clinical models alone.

Conclusions
The nomogram consisting of the radscore and preoperative clinical characteristics showed good diagnostic performance in predicting LNM in patients with HGSOC. It may be used as a noninvasive method for assessing the lymph node status in these patients.

Supplementary Information
The online version contains supplementary material available at 10.1186/s13048-024-01375-7.

Background
Ovarian cancer (OC) has the highest mortality rate among all gynecological malignancies [1]. Epithelial ovarian cancer (EOC) accounts for more than 95% of all OC cases [2]. Although considerable progress has been made in the diagnosis and treatment of EOC, its prognosis remains poor [3]. High-grade serous ovarian cancer (HGSOC) accounts for approximately 60% of EOC cases [4]. Most patients with HGSOC have advanced disease at the time of diagnosis, and their long-term survival rates are low [5,6]. The International Federation of Gynecology and Obstetrics (FIGO) ovarian cancer staging system [7] does not include substaging of lymph nodes, except as a distant disease manifestation. However, it has been shown that lymph node metastasis (LNM) reflects tumor infiltration and spread, and that its incidence is lower in early- than in late-stage disease. Lymph node status significantly affects the survival of patients with OC. Patients with LNM are usually classified as stage III or IV and have a poorer prognosis [8,9].
Currently, surgical and histopathological diagnosis is the gold standard for staging of EOC. According to the National Comprehensive Cancer Network guidelines [10], the resection of lymph nodes that appear enlarged or suspicious on preoperative imaging or intraoperative exploration is recommended. However, there is significant controversy regarding the use of lymph node dissection for staging OC. Several studies [9,11] have demonstrated that systematic lymph node dissection does not provide any benefit, with no difference in progression-free survival or overall survival, and is associated with a higher incidence of complications.
Despite advances in medical imaging technology, the accurate preoperative prediction of lymph node status remains difficult. Computed tomography (CT) with intravenous contrast is the first-line imaging method for staging and follow-up of OC according to the American College of Radiology guidelines [12]. However, according to a meta-analysis, the sensitivity of CT for predicting LNM is only 0.47 [13,14]. The diagnostic efficacy of magnetic resonance imaging (MRI) [15] and positron emission tomography/computed tomography (PET/CT) [16] is also limited. The overlap between reactive hyperplastic and metastatic lymph nodes is the most common reason for false positives and false negatives [17]. Therefore, it is necessary to explore methods for the preoperative prediction of LNM.
The essence of radiomics is to extract unrecognizable features from medical images and establish a relationship between these high-throughput features and a low-noise state [18,19]. Currently, CT- and MRI-based radiomics have been applied for the individualized treatment of HGSOC [6,20–22]. Researchers are attempting to establish radiomic models based on ultrasonography [23].
To the best of our knowledge, no previous studies have explored ultrasound-based radiomics to predict LNM in patients with HGSOC. Therefore, the aim of this study was to explore the feasibility of predicting the lymph node status using preoperative ultrasound imaging-based radiomics in patients with HGSOC, as well as to investigate whether preoperative clinical parameters can assist in predicting LNM.

Patients
We retrospectively reviewed 920 consecutive patients with HGSOC in two institutions (Institution 1: Shengjing Hospital of China Medical University; Institution 2: Huaxiang Hospital of Shengjing Hospital of China Medical University) from January 2017 to December 2021. All patients underwent comprehensive staging surgery with pelvic and para-aortic lymph node dissection. The inclusion criteria were as follows: (1) HGSOC diagnosed by postoperative pathology, (2) primary ovarian cancer, (3) ultrasound examination performed in our hospital within 3 weeks before surgery, (4) initial surgery, and (5) clear postoperative lymph node metastasis status. The exclusion criteria were as follows: (1) combination of other gynecological malignancies, (2) metastatic ovarian cancer, (3) preoperative adjuvant chemotherapy or radiotherapy, (4) unsatisfactory ultrasound images, and (5) incomplete clinical data. The endpoint event in this study was lymph node status determined by histopathologic findings after comprehensive staging surgery. Finally, our study included 401 eligible patients, with patients from institution 1 included in the training cohort (n = 322) and patients from institution 2 included in the test cohort (n = 79). A flowchart of the study is shown in Fig. 1.

Tumor segmentation and feature extraction
Preoperative ultrasound images of all patients with HGSOC were retrieved using a picture archiving and communication system (PACS). Images from the final preoperative ultrasound examination were selected. For HGSOC with bilateral progression, larger and more complex solid lesions were selected for analysis. Three standard images were selected for each lesion: the largest section, including the most complex lesion component; the section orthogonal to the largest section; and the color Doppler imaging section of the solid component of the lesion. All lesions were manually delineated by a radiologist with three years of experience in gynecologic imaging using the Darwin research platform (https://arxiv.org/abs/2009.00908). All segmentations were confirmed by a senior radiologist with over 25 years of gynecological imaging experience, who was blinded to the pathological results corresponding to the images. If the two radiologists disagreed, the final region of interest (ROI) was confirmed through discussion.
Data preprocessing is an important step in machine learning that can make the algorithm converge faster to obtain a more reasonable model. We used different ultrasound diagnostic instruments such as LOGIQ E9 (GE Co., NY, USA) and Mylab Class C (Esaote Co., Genoa, Italy) for ultrasound image acquisition. Therefore, before performing feature extraction, we normalized the original feature vector by subtracting the mean value from the extracted feature data and dividing it by the variance to minimize the differences caused by the different ultrasound instruments.
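As an illustration of this kind of normalization, the following minimal sketch standardizes each feature column. It is not the authors' implementation: the function name `zscore_columns` and the toy data are hypothetical, and the sketch divides by the population standard deviation (the conventional z-score), which is an assumption, since the text above states that the variance was used.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit standard deviation.

    Assumption: conventional z-score scaling (divide by the standard deviation).
    """
    columns = list(zip(*rows))
    stats = []
    for col in columns:
        mu = mean(col)
        sd = pstdev(col)
        # A constant column has sd == 0; use 1.0 so it maps to all zeros.
        stats.append((mu, sd if sd > 0 else 1.0))
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, stats)] for row in rows]


# Toy feature matrix: three lesions (rows) and two features (columns).
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = zscore_columns(features)
```

In a train/test setting, the mean and standard deviation would normally be estimated on the training cohort and then reused for the test cohort; the sketch omits that step for brevity.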

Feature selection and radiomic model development
First, we used a feature filter (i.e., sample variance F-value) to evaluate the linear correlation between each feature and the category label, and retained the top 10% of the 3375 features with the largest F-values. Subsequently, because some machine learning models can evaluate feature importance, a classifier can be trained iteratively, removing the least important features at the end of each training round until the classification performance is optimal. We used a recursive feature elimination (RFE) method based on logistic regression (LR) to train the model iteratively with the step size set to 1. The features with the lowest weights were removed each time, and the top 12 features were selected. Models consisting of fewer than 12 features did not improve the classification performance.
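The first screening step can be sketched as follows. This is an illustrative pure-Python version of a one-way ANOVA F-value filter, not the authors' code; the function names `anova_f` and `top_fraction` and the toy data are hypothetical. The second step (RFE with logistic regression) is not reproduced here.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-value of a single feature across the label groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    # Between-group and within-group mean squares.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def top_fraction(rows, labels, fraction=0.10):
    """Return the column indices of the features with the largest F-values."""
    columns = list(zip(*rows))
    scores = [anova_f(list(col), labels) for col in columns]
    keep = max(1, round(len(columns) * fraction))
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Toy data: column 0 separates the two classes, column 1 does not.
rows = [[1, 5], [2, 1], [3, 4], [4, 2]]
labels = [0, 0, 1, 1]
selected = top_fraction(rows, labels, fraction=0.5)
```

With `fraction=0.10`, the same function would keep roughly one tenth of the columns, mirroring the top-10% rule described above.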
After the optimal subset of features was derived from the above two feature selection steps, we used five supervised machine learning methods to build the classifier in the training cohort: support vector machine (SVM), K-nearest neighbor (KNN), random forest (RF), decision tree (DT), and LR. For SVM, the radial basis function (RBF) was chosen as the kernel function to fit the data. For RF and DT, overfitting was prevented by limiting the minimum sample size of the leaf nodes and the maximum tree depth. For LR, L1 regularization was used as a penalty. Ten-fold cross-validation was performed for each classifier. The average area under the receiver operating characteristic (ROC) curve (AUC) and average sensitivity, specificity, and accuracy were provided as performance metrics for the cross-validation cohort. The classifier with the highest mean AUC was selected. Finally, the radscore for each patient was calculated according to a linear model based on LR, and the radiomics model was constructed based on the radscore.
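The 10-fold cross-validation described above relies on partitioning the training cohort into folds. The sketch below shows one way to build such folds while keeping the LNM-positive proportion roughly equal in each fold. Stratification, the function name `stratified_kfold`, the seed, and the toy labels are assumptions made for illustration; the text does not state how the folds were formed.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs with class proportions roughly preserved."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


# Toy labels: 40 LNM-positive and 60 LNM-negative cases.
labels = [1] * 40 + [0] * 60
splits = list(stratified_kfold(labels, k=10))
```

Each classifier would then be fitted on `train` and scored on `val` for every split, and the per-fold AUC, sensitivity, specificity, and accuracy averaged.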

Establishment of the clinical and combined models
Radiomics can be used to extract high-dimensional features from images. However, owing to the heterogeneity of ultrasound images, some features closely related to the disease are equally relevant for predicting LNM, such as the lesion size, unilateral or bilateral involvement, presence of ultrasonography (US)-reported pelvic fluid, presence of US-reported peritoneal thickening, and presence of US-reported pelvic wall nodules. We recorded the above information from the US report and collected clinical data from the Hospital Information System (HIS), including age, menopausal status, and preoperative serological indicators (cancer antigen 125 (CA125) levels, human epididymis protein 4 (HE4) levels, carcinoembryonic antigen (CEA) levels, and cancer antigen 724 (CA724) levels). Lesion size, US-reported pelvic fluid, age, CA125, HE4, CEA, and CA724 were set as continuous variables. The other features were set as categorical variables.
We used the R language 'mlr3' package to construct an LR-based machine learning feature screener for clinical characteristics. A 10-fold cross-validation with 20 iterations was used to select the characteristics included in the clinical model with the best AUC performance. A clinical model was developed using these clinical characteristics. To explore whether combining the radscore with the relevant clinical characteristics could further improve the predictive performance of the model, we combined the radscore and relevant clinical characteristics to build a multivariate logistic regression model and constructed a nomogram.
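A multivariate logistic regression model of this kind maps a weighted sum of predictors to a probability, which is what a nomogram displays graphically. The sketch below illustrates the calculation using the predictors reported elsewhere in this paper (radscore, CA125, and US-reported peritoneal thickening). All coefficient values are hypothetical placeholders chosen for illustration; the fitted coefficients are not given in the text.

```python
import math


def lnm_probability(radscore, ca125, peritoneal_thickening, coef):
    """Predicted LNM probability from a logistic model: 1 / (1 + exp(-z))."""
    z = (
        coef["intercept"]
        + coef["radscore"] * radscore
        + coef["ca125"] * ca125
        + coef["peritoneal_thickening"] * peritoneal_thickening
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for illustration only (not fitted values).
example_coef = {
    "intercept": -3.0,
    "radscore": 5.0,
    "ca125": 0.001,
    "peritoneal_thickening": 0.8,
}

# Example patient: radscore 0.69, CA125 500 U/mL, peritoneal thickening present.
p = lnm_probability(0.69, 500.0, 1, example_coef)
```

In a nomogram, each term of `z` is rescaled to a points axis, so summing the points and reading off the probability is equivalent to evaluating this function.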

External test and evaluation of the models
The three models were applied to the external test cohort. Decision curve analysis (DCA) was performed to illustrate the net clinical benefits derived from the three models. Calibration curves were used to assess the nomogram performance. The overall workflow of the radiomics model development and validation is displayed in Fig. 2.
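For readers unfamiliar with DCA, the quantity plotted on the y-axis is the net benefit at a given threshold probability p_t, commonly defined as TP/n - (FP/n) * p_t / (1 - p_t). The following sketch computes it for a model and for the "treat all" reference strategy. The function names and toy data are hypothetical, and this is a generic illustration rather than the authors' R workflow.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example: four patients with true LNM status and predicted probabilities.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]
nb_model = net_benefit(y_true, y_prob, threshold=0.2)
nb_all = net_benefit_treat_all(y_true, threshold=0.2)
```

The "treat none" strategy has a net benefit of zero at every threshold, so a model is useful at a given threshold only if its curve lies above both reference lines.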

Statistical analysis
All statistical analyses were performed using R version 4.1.3 (R Foundation for Statistical Computing, Vienna, Austria; https://www.R-project.org/). The statistical significance level was set at 0.05. The chi-square test was used to compare categorical variables, and the Mann-Whitney U test was used to compare continuous variables. The diagnostic efficiency of the models was evaluated using ROC curves and quantified using the AUC. The sensitivity, specificity, and accuracy were calculated to quantify various aspects of the models' diagnostic ability.
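The metrics named above can be computed as in the following sketch. The AUC is obtained from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half). The function names, the 0.5 cutoff, and the toy data are assumptions for illustration; the analyses in this study were performed in R.

```python
def auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs that are ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sens_spec_acc(y_true, y_score, cutoff=0.5):
    """Sensitivity, specificity, and accuracy at a score cutoff.

    Assumes both classes are present in y_true.
    """
    pred = [1 if s >= cutoff else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)


# Toy example: four patients with true LNM status and model scores.
y_true = [1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.1]
model_auc = auc(y_true, y_score)
sen, spe, acc = sens_spec_acc(y_true, y_score)
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients such as the ones analyzed here.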

Patient characteristics
Among the 401 patients included in this study, the mean age was 54.6 ± 8.71 years, 173 (43.1%) had postoperative pathologically confirmed LNM, and 228 (56.9%) had postoperative pathologically confirmed no-LNM; 65.6% of the patients were in the postmenopausal state. The mean maximum diameter of the lesion was 9.46 cm. Suspicious peritoneal thickening and pelvic wall nodules were detected on US in 25.9% and 20.7% of patients, respectively. There were no statistical differences in the clinical characteristics of the patients between the training and test cohorts (p > 0.01). The clinical characteristics of patients in the training and test cohorts are presented in Table 1. In the training cohort, we found significant differences (p < 0.05) between the LNM-negative and LNM-positive groups with regard to US-reported pelvic fluid, laterality, US-reported peritoneal thickening, US-reported pelvic wall nodules, CA125, HE4, CEA, and CA724 levels, with the LNM-positive group having a deeper pelvic fluid depth, greater odds of peritoneal thickening and pelvic wall nodules, and significantly higher serological indicators than the LNM-negative group. In contrast, only CA125, HE4 levels, and US-reported peritoneal thickening were significantly different between the two groups in the test cohort.

Feature selection and construction of the radiomic model
A total of 3375 features were extracted from 401 lesions, and 12 features that were highly correlated with LNM were selected using two-step feature selection. To derive the optimal prediction model, we selected five machine learning algorithms for classifier construction in the training cohort and compared their performances using 10-fold cross-validation. The performance of the five classifiers is shown in Table 2. The LR model achieved the best classification performance, with a mean AUC, sensitivity, specificity, and accuracy of 0.876, 0.688, 0.860, and 0.789, respectively.
We derived the radscore for each patient from these 12 features using a linear model based on LR and then applied the radscores to build a radiomic model. The scoring formula and the radscores for each patient are presented in Table S1. There was a significant difference in the radscore between patients with and without LNM in the training cohort (0.69 ± 0.27 vs. 0.23 ± 0.21; p < 0.05) and test cohort (0.74 ± 0.23 vs. 0.34 ± 0.29; p < 0.05). The AUC value of the radiomic model based on the radscore was 0.899 (95% CI: 0.864-0.933) in the training cohort and 0.855 (95% CI: 0.774-0.935) in the test cohort.

Clinical model construction and evaluation
We filtered clinical characteristics using an LR-based machine learning feature filter. After 20 iterations of 10-fold cross-validation, CA125 and US-reported peritoneal thickening were identified as the variables for the clinical model, which yielded an AUC of 0.762. The AUC of the clinical model for predicting LNM was 0.770 (95% CI: 0.719-0.822) in the training cohort and 0.735 (95% CI: 0.622-0.848) in the test cohort.

Combined model construction and evaluation
We performed LR using the independent clinical predictors and radscore and constructed a combined model. We compared the diagnostic performance of the radiomic, clinical, and combined models. Table 3 and Fig. 3a, b show the sensitivity, specificity, accuracy, and AUC of the three models in the training and test cohorts. Compared with the radiomic model, the combined model improved the AUC from 0.899 (95% CI: 0.864-0.933) to 0.930 (95% CI: 0.902-0.958) in the training cohort and from 0.855 (95% CI: 0.774-0.935) to 0.881 (95% CI: 0.801-0.954) in the test cohort. DCA showed that the combined model had a higher overall net benefit across the threshold probabilities (Fig. 3c, d).
The combined model was then used to construct a nomogram (Fig. 4a). Calibration curves of the combined model are shown in Fig. 4b, c. The alignment of the dashed and solid lines indicates good agreement between the predicted LNM results and the true state in the training and test cohorts.

Discussion
In our study, we constructed a prediction model to predict LNM in HGSOC. We used three preoperative ultrasound images of patients with OC to identify radiomic features and calculate the radscore. A nomogram was created using the radscore, serologic CA125 levels, and US-reported peritoneal thickening. This nomogram can predict the probability of LNM in patients with ovarian cancer.
Since the development of radiomics, the relationship between the high-throughput information embedded in the images and the biological behavior of the disease has been the focus of research. We believe that the highly aggressive and metastatic tendencies during tumor development cause changes in the imaging presentation that are difficult to observe with the naked eye during the early stages. In previous studies, researchers tended to look for direct imaging signs of metastatic lymph nodes, such as an oval shape and disappearance of lymphatic portals, while ignoring the features of the tumor itself [13]. Many histological micrometastases may not be morphologically altered, whereas reactive hyperplastic lymph nodes may exhibit changes in size and morphology. Studies have shown that radiomics based on primary lesions can identify LNM in cervical cancer [23]. The prediction of LNM based on imaging features of the primary tumor is in an exploratory stage.
In this study, we extracted a large number of radiomic features from US images. During feature selection, primary screening was first performed using the sample variance F-value, followed by RFE to filter out the 12 most important and stable features for predicting LNM. RFE has been increasingly adopted as a feature selection method to obtain key combinations of variables that maximize the model performance by adding or removing specific feature variables [25]. We used radiomic features to construct different machine learning classifiers. The AUCs of the five classifiers in the test cohort ranged from 0.774 to 0.876. As a linear model, LR allows for the output of probabilities for binary classification.
Previously, radiomics based on CT and MRI has been used to predict metastasis in ovarian cancer [20,21,26]. Some researchers have used CT and PET to predict pelvic and/or para-aortic LNM in patients with advanced EOC, and the specificity of the obtained radiomic model for predicting high-risk lymph nodes was reportedly 78.3% [27]. However, the sample size of that study was small, and the reliability of its conclusions is uncertain. Yao et al. [28] developed a model for predicting the lymph node status based on PET images of patients with ovarian cancer using residual neural networks and SVM for modeling. Their model had an AUC of 0.92 in the test cohort, but it only included patients with early-stage ovarian cancer, whereas most patients are already in an advanced stage at diagnosis. The clinical stage of OC was not limited in our study, which may give our model greater clinical applicability.
In addition to preoperative imaging, serum tumor markers are measured in patients with suspected ovarian cancer. Serum CA125 and HE4 levels are considered clinical predictors of survival and treatment response in patients with EOC, but there is no conclusive evidence on whether serum tumor markers are predictive of LNM, and they vary widely between different study populations [29,30]. Zhou et al. [31] found that a preoperative serum CA125 level > 740 U/mL was a risk factor for LNM in patients with EOC. It has also been suggested that CA125 levels are not associated with LNM in early OC [32]. Increased HE4 levels promote ovarian cancer cell invasion and metastasis through certain signaling pathways [33]. In our analysis, serum CA125 levels appeared to play a role in predicting LNM in OC, which is consistent with some studies, whereas HE4, CA724, and CEA levels were excluded from our clinical model. Previous studies have investigated the predictors of LNM in OC and concluded that high-grade serous tumors, positive peritoneal cytology, advanced clinical stage, interval surgery, and bilateral adnexal involvement can predict LNM in patients with OC [34,35]. This is consistent with our conclusion that peritoneal thickening on US images correlates with LNM.
However, our study has some limitations. First, our study included only patients with HGSOC and did not include other pathological subtypes, which limits the generalizability of the model. For most patients with suspected ovarian cancer, a puncture biopsy of the lesion is performed for better treatment planning; therefore, most physicians already know the pathological type before surgery. Nevertheless, we will continue training the model so that it can be applied to all pathological types of OC. Second, since this was a single-center retrospective study, the sample size needs to be increased. Whether the model can be applied to other hospitals and to physicians with different levels of seniority remains to be investigated. Third, unlike most CT- and MRI-based radiomic studies, not all US images were acquired using the same US instrument, and different instrument parameter settings may have led to feature heterogeneity. This is because the widespread use of US makes it impossible to examine all gynecological patients in large general hospitals using the same instrument model. In this regard, we normalized the US images to make the distribution of each dimension similar in order to speed up model convergence and improve the model accuracy.

Conclusions
In conclusion, we successfully developed a radiomic model based on preoperative US images and clinical characteristics and established a nomogram that can predict LNM more accurately in patients with HGSOC. Using this model, clinicians can decide whether to perform extensive lymph node dissection in patients with HGSOC, thereby avoiding the adverse effects of unnecessary lymph node dissection. In the future, we will incorporate more pathological types of ovarian cancer, increase the sample size, perform external validation across multiple hospitals, perform prospective validation to test the model, and develop US-based radiomics for patients with all stages of ovarian cancer to improve the diagnosis and prognosis.

Fig. 1 Flowchart of the inclusion and exclusion criteria for patients

Fig. 2 Workflow of this study

Fig. 3 Predictive performance of the radiomic, clinical and combined models in the training and test cohorts. (a, b) show the ROC curves of the different models in the training and test cohorts. Decision curve analysis (c, d) illustrates the net clinical benefits of the prediction model. The y-axis represents the net benefit and the x-axis represents the threshold probability. The blue line indicates "treat all" and the pink horizontal line denotes "treat none." ROC, receiver operating characteristic curve

Fig. 4 (a) Nomogram for predicting lymph node metastasis of HGSOC based on radscore and clinical characteristics. In the nomogram, a vertical line was first made according to the radscore to determine the corresponding value of points. Similarly, the CA125 and US-reported peritoneal thickening values were also determined. The total points were the sum of the three points above. Finally, a vertical line was made according to the value of the total points to determine the probability of LNM. (b, c) show the calibration curves of the nomogram developed in the training and test cohorts. HGSOC, high-grade serous ovarian cancer

Table 2
Diagnostic efficiency of different classifiers in the training and test cohorts
Abbreviations: SVM, support vector machine; KNN, K-nearest neighbor; RF, random forest; DT, decision tree; LR, logistic regression; AUC, area under the curve; CI, confidence interval; SEN, sensitivity; SPE, specificity; ACC, accuracy

Table 3
Diagnostic efficiency of the clinical, radiomic, and combined models in the training and test cohorts